Prosper Loan Analysis by MUKUL KOMMABATHULA
========================================================
Prosper Marketplace is America’s first peer-to-peer lending marketplace, with over $7 billion in funded loans. Borrowers request personal loans on Prosper and investors (individual or institutional) can fund anywhere from $2,000 to $35,000 per loan request. Investors can consider borrowers’ credit scores, ratings, and histories and the category of the loan. Prosper handles the servicing of the loan and collects and distributes borrower payments and interest back to the loan investors.
Prosper verifies borrowers’ identities and select personal data before funding loans and manages all stages of loan servicing. Prosper’s unsecured personal loans are fully amortized over a period of three or five years, with no pre-payment penalties. Prosper generates revenue by collecting a one-time fee on funded loans from borrowers and assessing an annual loan servicing fee to investors.
## 'data.frame': 113937 obs. of 81 variables:
## $ ListingKey : chr "1021339766868145413AB3B" "10273602499503308B223C1" "0EE9337825851032864889A" "0EF5356002482715299901A" ...
## $ ListingNumber : int 193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
## $ ListingCreationDate : chr "2007-08-26 19:09:29.263000000" "2014-02-27 08:28:07.900000000" "2007-01-05 15:00:47.090000000" "2012-10-22 11:02:35.010000000" ...
## $ CreditGrade : chr "C" "" "HR" "" ...
## $ Term : int 36 36 36 36 36 60 36 36 36 36 ...
## $ LoanStatus : chr "Completed" "Current" "Completed" "Current" ...
## $ ClosedDate : chr "2009-08-14 00:00:00" "" "2009-12-17 00:00:00" "" ...
## $ BorrowerAPR : num 0.165 0.12 0.283 0.125 0.246 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ LenderYield : num 0.138 0.082 0.24 0.0874 0.1985 ...
## $ EstimatedEffectiveYield : num NA 0.0796 NA 0.0849 0.1832 ...
## $ EstimatedLoss : num NA 0.0249 NA 0.0249 0.0925 ...
## $ EstimatedReturn : num NA 0.0547 NA 0.06 0.0907 ...
## $ ProsperRating..numeric. : int NA 6 NA 6 3 5 2 4 7 7 ...
## $ ProsperRating..Alpha. : chr "" "A" "" "A" ...
## $ ProsperScore : num NA 7 NA 9 4 10 2 4 9 11 ...
## $ ListingCategory..numeric. : int 0 2 0 16 2 1 1 2 7 7 ...
## $ BorrowerState : chr "CO" "CO" "GA" "GA" ...
## $ Occupation : chr "Other" "Professional" "Other" "Skilled Labor" ...
## $ EmploymentStatus : chr "Self-employed" "Employed" "Not available" "Employed" ...
## $ EmploymentStatusDuration : int 2 44 NA 113 44 82 172 103 269 269 ...
## $ IsBorrowerHomeowner : chr "True" "False" "False" "True" ...
## $ CurrentlyInGroup : chr "True" "False" "True" "False" ...
## $ GroupKey : chr "" "" "783C3371218786870A73D20" "" ...
## $ DateCreditPulled : chr "2007-08-26 18:41:46.780000000" "2014-02-27 08:28:14" "2007-01-02 14:09:10.060000000" "2012-10-22 11:02:32" ...
## $ CreditScoreRangeLower : int 640 680 480 800 680 740 680 700 820 820 ...
## $ CreditScoreRangeUpper : int 659 699 499 819 699 759 699 719 839 839 ...
## $ FirstRecordedCreditLine : chr "2001-10-11 00:00:00" "1996-03-18 00:00:00" "2002-07-27 00:00:00" "1983-02-28 00:00:00" ...
## $ CurrentCreditLines : int 5 14 NA 5 19 21 10 6 17 17 ...
## $ OpenCreditLines : int 4 14 NA 5 19 17 7 6 16 16 ...
## $ TotalCreditLinespast7years : int 12 29 3 29 49 49 20 10 32 32 ...
## $ OpenRevolvingAccounts : int 1 13 0 7 6 13 6 5 12 12 ...
## $ OpenRevolvingMonthlyPayment : num 24 389 0 115 220 1410 214 101 219 219 ...
## $ InquiriesLast6Months : int 3 3 0 0 1 0 0 3 1 1 ...
## $ TotalInquiries : num 3 5 1 1 9 2 0 16 6 6 ...
## $ CurrentDelinquencies : int 2 0 1 4 0 0 0 0 0 0 ...
## $ AmountDelinquent : num 472 0 NA 10056 0 ...
## $ DelinquenciesLast7Years : int 4 0 0 14 0 0 0 0 0 0 ...
## $ PublicRecordsLast10Years : int 0 1 0 0 0 0 0 1 0 0 ...
## $ PublicRecordsLast12Months : int 0 0 NA 0 0 0 0 0 0 0 ...
## $ RevolvingCreditBalance : num 0 3989 NA 1444 6193 ...
## $ BankcardUtilization : num 0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
## $ AvailableBankcardCredit : num 1500 10266 NA 30754 695 ...
## $ TotalTrades : num 11 29 NA 26 39 47 16 10 29 29 ...
## $ TradesNeverDelinquent..percentage. : num 0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
## $ TradesOpenedLast6Months : num 0 2 NA 0 2 0 0 0 1 1 ...
## $ DebtToIncomeRatio : num 0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
## $ IncomeRange : chr "$25,000-49,999" "$50,000-74,999" "Not displayed" "$25,000-49,999" ...
## $ IncomeVerifiable : chr "True" "True" "True" "True" ...
## $ StatedMonthlyIncome : num 3083 6125 2083 2875 9583 ...
## $ LoanKey : chr "E33A3400205839220442E84" "9E3B37071505919926B1D82" "6954337960046817851BCB2" "A0393664465886295619C51" ...
## $ TotalProsperLoans : int NA NA NA NA 1 NA NA NA NA NA ...
## $ TotalProsperPaymentsBilled : int NA NA NA NA 11 NA NA NA NA NA ...
## $ OnTimeProsperPayments : int NA NA NA NA 11 NA NA NA NA NA ...
## $ ProsperPaymentsLessThanOneMonthLate: int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPaymentsOneMonthPlusLate : int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPrincipalBorrowed : num NA NA NA NA 11000 NA NA NA NA NA ...
## $ ProsperPrincipalOutstanding : num NA NA NA NA 9948 ...
## $ ScorexChangeAtTimeOfListing : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanCurrentDaysDelinquent : int 0 0 0 0 0 0 0 0 0 0 ...
## $ LoanFirstDefaultedCycleNumber : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanMonthsSinceOrigination : int 78 0 86 16 6 3 11 10 3 3 ...
## $ LoanNumber : int 19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ LoanOriginationDate : chr "2007-09-12 00:00:00" "2014-03-03 00:00:00" "2007-01-17 00:00:00" "2012-11-01 00:00:00" ...
## $ LoanOriginationQuarter : chr "Q3 2007" "Q1 2014" "Q1 2007" "Q4 2012" ...
## $ MemberKey : chr "1F3E3376408759268057EDA" "1D13370546739025387B2F4" "5F7033715035555618FA612" "9ADE356069835475068C6D2" ...
## $ MonthlyLoanPayment : num 330 319 123 321 564 ...
## $ LP_CustomerPayments : num 11396 0 4187 5143 2820 ...
## $ LP_CustomerPrincipalPayments : num 9425 0 3001 4091 1563 ...
## $ LP_InterestandFees : num 1971 0 1186 1052 1257 ...
## $ LP_ServiceFees : num -133.2 0 -24.2 -108 -60.3 ...
## $ LP_CollectionFees : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_GrossPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NetPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NonPrincipalRecoverypayments : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PercentFunded : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Recommendations : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsCount : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsAmount : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Investors : int 258 1 41 158 20 1 1 1 1 1 ...
- 36 months seems to be the term that borrowers choose most often.
# Looking into the percentage of borrowers per loan origination year
# (assumes a LoanOriginationYear column has already been added to Loan_data)
tblFun <- function(x){
  tbl <- table(x)
  res <- cbind(tbl,round(prop.table(tbl)*100,2))
  colnames(res) <- c('Number Of Borrowers','Percentage')
  res
}
tblFun(Loan_data$LoanOriginationYear)
## Number Of Borrowers Percentage
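The structure listing above shows no `LoanOriginationYear` column, which may be why the table prints no rows. Below is a minimal sketch of one way to derive the year, assuming it is the first four characters of `LoanOriginationDate`. The three-row `toy_loans` data frame is a made-up stand-in for `Loan_data`.

```r
# Hypothetical stand-in for Loan_data; dates use the format shown by str()
toy_loans <- data.frame(
  LoanOriginationDate = c("2007-09-12 00:00:00",
                          "2014-03-03 00:00:00",
                          "2007-01-17 00:00:00"),
  stringsAsFactors = FALSE
)

# Assumption: the year is the first four characters of the date string
toy_loans$LoanOriginationYear <-
  as.integer(substr(toy_loans$LoanOriginationDate, 1, 4))

# Number of loans per year and their share in percent
year_counts <- table(toy_loans$LoanOriginationYear)
year_share <- round(prop.table(year_counts) * 100, 2)

print(cbind(`Number Of Borrowers` = year_counts, Percentage = year_share))
```

The same two lines of `table()` and `prop.table()` would apply unchanged to the full data set once the column exists.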
summary(Loan_data$BorrowerRate)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1340 0.1840 0.1928 0.2500 0.4975
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Borrower interest rates range from 0.0 to about 0.5. For most borrowers the rate is at or below 0.25, which is the third quartile.
Let’s check how many borrowers have an interest rate of zero.
sum(Loan_data$BorrowerRate==0)
## [1] 8
There are 8 loans with an interest rate of zero percent. These loans may have been issued before 2009.
Now we will explore which Prosper rating levels are available.
Loan_data$ProsperRating..Alpha. <- ordered(Loan_data$ProsperRating..Alpha.,
levels = c("AA","A","B","C","D","E","HR",""))
levels(Loan_data$ProsperRating..Alpha.)
## [1] "AA" "A" "B" "C" "D" "E" "HR" ""
table(Loan_data$ProsperRating..Alpha.)
##
## AA A B C D E HR
## 5372 14551 15581 18345 14274 9795 6935 29084
ggplot(aes(x = ProsperRating..Alpha.), data = Loan_data) +
geom_bar(fill = '#369b80',color = '#0542c4')
Setting aside the 29,084 loans with no rating, the distribution looks roughly bell-shaped, and the most common Prosper ratings are A, B, C, and D.
Let’s check the purposes for which borrowers take loans.
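To judge the shape of the rated loans alone, the blank level can be removed before tabulating or plotting. This is a small sketch on a made-up `ratings` vector, not the report's own code. It reuses the level order defined above.

```r
# Hypothetical sample of ratings; "" marks loans without a Prosper rating
ratings <- ordered(c("A", "", "C", "B", "", "HR", "C"),
                   levels = c("AA", "A", "B", "C", "D", "E", "HR", ""))

# Keep rated loans only, then rebuild the factor without the "" level
rated <- ratings[ratings != ""]
rated <- factor(rated, levels = setdiff(levels(ratings), ""), ordered = TRUE)

rated_counts <- table(rated)
print(rated_counts)
```

Rebuilding the factor with an explicit level list removes only the blank level, so ratings that happen to be absent from a subset still appear with a count of zero.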
# Create a new variable to display the full name Instead of a number for listing category
Loan_data$ListingCategory..string <- mapvalues(Loan_data$ListingCategory..numeric.,
from = c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
16,17,18,19,20),
to = c("Not Available", "Debt Consolidation",
"Home Improvements", "Business",
"Personal Loan","Student Use","Auto",
"Other","Baby&Adoption","Boat",
"Cosmetic Procedure","Engagement Ring",
"Green Loans","Household Expenses",
"Large Purchases","Medical/Dental",
"MotorCycle","RV","Taxes","Vacation",
"Wedding Loans"))
# Create a table to explore the number of borrowers in each category
table(Loan_data$ListingCategory..string)
##
## Auto Baby&Adoption Boat Business
## 2572 199 85 7189
## Cosmetic Procedure Debt Consolidation Engagement Ring Green Loans
## 91 58308 217 59
## Home Improvements Household Expenses Large Purchases Medical/Dental
## 7433 1996 876 1522
## MotorCycle Not Available Other Personal Loan
## 304 16965 10494 2395
## RV Student Use Taxes Vacation
## 52 756 885 768
## Wedding Loans
## 771
Most borrowers take loans for debt consolidation (58,308 listings).
After debt consolidation, and leaving aside the "Not Available" and "Other" categories, the most common purposes are home improvements (7,433) and business (7,189).
Let’s explore the geographical distribution of borrowers.
Prosper is a California-based company, which might be why more loans originate in California than in any other state.
The next most common states are FL, GA, IL, NY, and TX.
Exploring the range of loan amounts borrowers are requesting.
summary(Loan_data$LoanOriginalAmount)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
- The distribution is positively skewed. The minimum loan amount is
1000 and the maximum is 35000. The third quartile is 12000, so there is a big
difference between Q3 and the maximum amount.
The median is 6500, so the majority of loans are below 10000.
Now we will check borrowers’ monthly income.
summary(Loan_data$StatedMonthlyIncome)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3200 4667 5608 6825 1750003
. There seems to be an outlier: the maximum stated monthly income is 1,750,003, far above the third quartile of 6,825.
. I will change the x limits to see the graph more closely.
Borrowers with lower monthly incomes take most of the loans. It is also interesting that some borrowers report zero monthly income, yet they still managed to get a loan.
Let’s check the number of people who got loans with zero income.
# Lets check the number of borrowers with zero income
sum(Loan_data$StatedMonthlyIncome == 0)
## [1] 1394
. A total of 1394 borrowers got loans with a stated monthly income of zero. This group includes listings created both before and after 2009, so the zero incomes cannot be explained by the listing period alone. It is interesting that all of these borrowers fall into the $0 or "Not employed" income ranges. Maybe they showed some property to get the loan, or they have some other kind of earnings that do not count as monthly income.
table(Loan_data$IncomeRange)
##
## $0 $1-24,999 $100,000+ $25,000-49,999 $50,000-74,999
## 621 7274 17337 32192 31050
## $75,000-99,999 Not displayed Not employed
## 16916 7741 806
ggplot(aes(x=IncomeRange), data=Loan_data) +
geom_bar(fill='#369b80') +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5,hjust = 1))
. Most borrowers fall in the $25,000-49,999 and $50,000-74,999 income ranges.
. Let’s look at the debt-to-income ratio.
summary(Loan_data$DebtToIncomeRatio)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
. To get a clearer graph, we will limit the x-axis to the 99th percentile.
# Drop missing values, then compute the 50th, 90th and 99th percentiles of the debt-to-income ratio
debt_income_ratio <- subset(Loan_data, !is.na(DebtToIncomeRatio))
quantile(debt_income_ratio$DebtToIncomeRatio, c(0.5, 0.9, 0.99))
## 50% 90% 99%
## 0.22 0.42 0.86
. Now the graph is much clearer. 99% of borrowers have a debt-to-income ratio of 0.86 or less. This is a good sign, because people cannot spend all of their income on loan payments.
. Let’s investigate the number of borrowers whose debt-to-income ratio is greater than 1.
# Check number of borrowers with DebtToIncomeRatio > 1
table(Loan_data$DebtToIncomeRatio > 1)
##
## FALSE TRUE
## 104584 799
. Let’s look into their loans’ status.
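The loan statuses of this group could be tabulated along the following lines. This is a sketch on a made-up five-row `toy_loans` data frame. The status labels are illustrative and may not match every value in the real `LoanStatus` column.

```r
# Hypothetical stand-in for Loan_data with the two columns needed here
toy_loans <- data.frame(
  DebtToIncomeRatio = c(0.17, 1.50, NA, 10.01, 0.36),
  LoanStatus = c("Completed", "Current", "Completed", "Chargedoff", "Current"),
  stringsAsFactors = FALSE
)

# subset() drops rows where the condition is FALSE or NA
high_dti <- subset(toy_loans, DebtToIncomeRatio > 1)

# Count of loans per status and their share in percent
status_counts <- table(high_dti$LoanStatus)
status_share <- round(prop.table(status_counts) * 100, 1)

print(status_counts)
print(status_share)
```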
Here, I set up a data frame containing the variables of interest for further analysis.
# Subset a dataframe to Explore some variables
selected_df <- subset(Loan_data, select = c(BorrowerAPR,BorrowerRate, LenderYield,
ProsperRating..numeric.,CreditScoreRangeLower,CreditScoreRangeUpper,
CurrentCreditLines,OpenCreditLines,TotalCreditLinespast7years,
OpenRevolvingAccounts,TotalInquiries,AmountDelinquent,RevolvingCreditBalance,
BankcardUtilization,AvailableBankcardCredit,DebtToIncomeRatio,
LoanMonthsSinceOrigination,LoanOriginalAmount, MonthlyLoanPayment,Investors))
ggcorr(selected_df, hjust=0.95, size = 2.7, label = TRUE, label_size = 3, layout.exp = 3.5, color = 'black')
## Warning: Use of `Loan_data$ProsperRating..Alpha.` is discouraged.
## ℹ Use `ProsperRating..Alpha.` instead.
. As we can see, the borrower rate increases as the Prosper rating gets worse.
. Now we will analyze the basis on which the Prosper rating is given.
Loan_data$EmploymentStatus <- ordered(Loan_data$EmploymentStatus, levels = c("Not employed",
"Other","Self-employed", "Employed",
"Part-time","Retired","Full-time"))
ggplot(aes(x = EmploymentStatus), data = subset(Loan_data, !is.na(Loan_data$ProsperRating..numeric.))) +
geom_bar(aes(fill = ProsperRating..Alpha.), position = 'fill')
. It seems that employment status plays a role in determining the Prosper rating. Employed borrowers tend to have a better Prosper rating than borrowers who are not employed.
. We will see how income range influences the Prosper rating.
. The higher the income range, the better the Prosper rating, presumably because higher earners can more comfortably pay their debts on time.
. We will see how credit score influences the Prosper rating.
. As the credit score increases, the Prosper rating also improves.
. Now we will see what factors influence credit score.
ggplot(aes(x = factor(CreditScoreRangeLower), y = CurrentCreditLines), data = subset(Loan_data, CreditScoreRangeLower>500)) +
geom_boxplot()
## Warning: Removed 5797 rows containing non-finite values (`stat_boxplot()`).
# Check the correlation between CreditScoreRangeLower and CurrentCreditLines
with(Loan_data, cor.test(CreditScoreRangeLower,CurrentCreditLines, method = "pearson"))
##
## Pearson's product-moment correlation
##
## data: CreditScoreRangeLower and CurrentCreditLines
## t = 46.809, df = 106331, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1361976 0.1479760
## sample estimates:
## cor
## 0.1420918
. The correlation is positive but weak (r = 0.14): borrowers with more current credit lines tend to have slightly better credit scores.
# Check the correlation between CreditScoreRangeLower and TotalInquiries
with(Loan_data, cor.test(CreditScoreRangeLower, TotalInquiries))
##
## Pearson's product-moment correlation
##
## data: CreditScoreRangeLower and TotalInquiries
## t = -96.631, df = 112776, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2819071 -0.2711270
## sample estimates:
## cor
## -0.2765257
. The fewer the inquiries, the better the credit score (r = -0.28).
# Check the correlation between BorrowerRate and CreditScoreRangeLower
with(Loan_data, cor.test(BorrowerRate,CreditScoreRangeLower, method = "pearson"))
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and CreditScoreRangeLower
## t = -175.17, df = 113344, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4661358 -0.4569730
## sample estimates:
## cor
## -0.4615667
. Higher credit scores go with lower interest rates (r = -0.46).
. Now we will see how monthly income, term, and loan original amount are influenced by different factors.
# Plotting StatedMonthlyIncome by MonthlyLoanPayment
ggplot(aes(x = StatedMonthlyIncome, y = MonthlyLoanPayment), data = Loan_data) +
geom_point(alpha = 1/10, fill=I("#ea56b1"),color=I("black"),shape=21)+
geom_smooth(method = "lm", color = 'red') +
scale_x_continuous(limits = c(0, quantile(Loan_data$StatedMonthlyIncome, 0.95)))
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 5677 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 5677 rows containing missing values (`geom_point()`).
# Check the correlation between StatedMonthlyIncome and MonthlyLoanPayment
with(Loan_data, cor.test(StatedMonthlyIncome,MonthlyLoanPayment, method = "pearson"))
##
## Pearson's product-moment correlation
##
## data: StatedMonthlyIncome and MonthlyLoanPayment
## t = 67.764, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1912423 0.2024055
## sample estimates:
## cor
## 0.1968303
. People who have more income tend to have higher monthly loan payments, although the correlation is weak (r = 0.20).
# Check the correlation between StatedMonthlyIncome and LoanOriginalAmount
with(Loan_data, cor.test(StatedMonthlyIncome,LoanOriginalAmount, method = "pearson"))
##
## Pearson's product-moment correlation
##
## data: StatedMonthlyIncome and LoanOriginalAmount
## t = 69.353, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1956816 0.2068243
## sample estimates:
## cor
## 0.2012595
. The higher the income, the higher the loan amount taken, although this correlation is also weak (r = 0.20).
# display borrowers' income range
table(Loan_data$IncomeRange)
##
## $0 $1-24,999 $25,000-49,999 $50,000-74,999 $75,000-99,999
## 621 7274 32192 31050 16916
## $100,000+
## 17337
. But beyond the $25,000-49,999 range, the number of people taking loans generally decreases as income increases. This seems plausible because people with higher incomes are more self-sufficient and may not need personal loans.
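The income-range table above lists the ranges in ascending order and omits the "Not displayed" and "Not employed" entries. The code that produced it is not shown, so the following is only a sketch of one way to get that layout. The six-element `toy_income` vector is made up.

```r
# Income ranges in ascending order, as they appear in the table above
income_levels <- c("$0", "$1-24,999", "$25,000-49,999",
                   "$50,000-74,999", "$75,000-99,999", "$100,000+")

# Hypothetical sample of IncomeRange values
toy_income <- c("$100,000+", "Not displayed", "$0",
                "$25,000-49,999", "Not employed", "$25,000-49,999")

# Values not listed in `levels` become NA, and table() omits NA by default
income_ordered <- factor(toy_income, levels = income_levels, ordered = TRUE)
income_counts <- table(income_ordered)

print(income_counts)
```

Because the factor is ordered, plots that use it on an axis also show the ranges from lowest to highest rather than alphabetically.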
. People take higher loan amounts for debt consolidation and for baby & adoption expenses.
. Term influences the borrower rate.
# Check the correlation between LoanOriginalAmount and BorrowerRate
with(Loan_data, cor.test(LoanOriginalAmount,BorrowerRate, method = "pearson"))
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and BorrowerRate
## t = -117.58, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3341283 -0.3237719
## sample estimates:
## cor
## -0.3289599
. As the loan amount increases, interest rates tend to be lower (r = -0.33).
Borrower rate is determined by Prosper rating, credit score, loan original amount, and term. There is a moderate negative correlation between borrower rate and credit score (Pearson’s r = -0.46). In turn, credit score is influenced by total inquiries, credit lines, and monthly loan payments. Loan original amount is influenced by term, employment status, and listing category.
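The correlation figures quoted in this report are Pearson’s r values from `cor.test()`. The share of variance explained is r squared, which can never be negative. A quick check with three estimates copied from the outputs above (the vector names are labels chosen for this sketch):

```r
# Pearson correlations copied from the cor.test() outputs above
r <- c(rate_vs_credit_score  = -0.4615667,
       income_vs_loan_amount =  0.2012595,
       loan_amount_vs_rate   = -0.3289599)

# Squaring r gives the proportion of variance explained by a linear fit
r_squared <- round(r^2, 3)
print(r_squared)
```

Even the strongest of these relationships, borrower rate against credit score, accounts for only about a fifth of the variance.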
. In this section, we will see how the main factors are interrelated.
. At the same level of Prosper rating and credit score, a longer term gives borrowers the chance to apply for a higher loan amount.
. We will see whether income influences loan amount. In the bivariate analysis, we saw that loan original amount and stated monthly income are weakly correlated (r = 0.20).
. Now we will see how they behave when term comes into the picture.
. Borrowers who have a good Prosper rating can get lower borrower rates and, at the same time, take higher loans.
. Even if their income is low, people can take higher loan amounts when they choose to pay them off in 5 years. This seems reasonable because borrowers will have affordable monthly loan payments and their debt-to-income ratio will be well below 1.
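The effect of term on affordability can be sketched with the standard fixed-payment (annuity) formula for a fully amortized loan. Applying this formula to Prosper loans is an assumption, and `monthly_payment` is a helper written for this illustration. For simplicity the sketch keeps the same rate for both terms.

```r
# Standard annuity formula; an assumption about how monthly payments are
# computed, not something stated in the data set
monthly_payment <- function(principal, annual_rate, term_months) {
  monthly_rate <- annual_rate / 12
  if (monthly_rate == 0) {
    return(principal / term_months)  # zero-interest loan: principal split evenly
  }
  principal * monthly_rate / (1 - (1 + monthly_rate)^(-term_months))
}

# Same loan amount and rate, repaid over 3 years versus 5 years
pay_36 <- monthly_payment(10000, 0.092, 36)
pay_60 <- monthly_payment(10000, 0.092, 60)

print(round(c(months_36 = pay_36, months_60 = pay_60), 2))
```

For the 36-month case the result is roughly 319, in line with the `MonthlyLoanPayment` shown in the structure listing for the loan with an amount of 10000 and a rate of 0.092. The 60-month payment is lower, which is what allows a borrower with the same income to carry a larger loan.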
. Overall, borrowers of every employment status can get higher loans, but they have to choose a longer term. In the graph, we can clearly see that employed borrowers take larger loan amounts than others in each term group.
. We will look at the graph of loan original amount versus income range.
. In this case too, borrowers can take higher loans when they choose a longer term and they earn more.
. In the bivariate analysis, we saw that higher loan original amounts come with better interest rates (r = -0.33). But when term comes into the picture, interest rates are a little higher.
Regardless of credit score, Prosper rating, employment status, and monthly income, borrowers have the opportunity to take higher loan amounts, but they have to choose a longer term to pay them off.
People who have more income are likely to take higher loan amounts. When I further analyzed loan original amount with respect to borrower rate, I found that people can borrow more money, but once term comes into the picture, interest rates are a little higher.
Borrowers who have a good Prosper rating can get lower borrower rates and, at the same time, take higher loans. People who have a lower Prosper rating cannot take higher loans such as $30,000, and they have to pay higher borrower rates even for smaller loan amounts. This trend seems quite normal because lenders take a risk by lending to people who have a bad Prosper rating, so they should get the benefit of higher interest rates. It is similar to the stock market: those who take the risk might make a huge profit or a loss.
From this boxplot it is clear that borrowers can take higher loans when they choose a longer term and they earn more. Prosper also makes sure that even people who take higher loan amounts have a debt-to-income ratio of less than 1.
. The data set had nearly 114,000 loans from Nov 2005 to March 2014. After 2009, the number of loans increased drastically. Prosper also changed its business model in 2009, and this might have attracted many borrowers.
. Previously, lenders determined the borrower rate; now Prosper sets interest rates according to credit risk. Many interesting insights can be drawn from this data. Initially, I was very confused by the large number of variables, but as time progressed I got the hang of them. It is also surprising to see that the purposes for which people take loans have changed drastically over the years.
. I think that a lot more can be analyzed using this data, such as why some people are not able to pay their loans on time, what determines interest rates, and what reasons make people take loans.